Data Process step

The Data Process step allows you to manipulate document data within a process, from splitting and combining documents to zipping and unzipping data.

You can define multiple processing steps to perform more than one action on the document data. The processing steps are executed in the order defined in the Data Process step. Each processing step operates on the data output from the previous processing step.
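The chaining behavior described above can be pictured as simple function composition. The following is a minimal Python sketch, not Boomi code: `run_steps` and the two example steps are hypothetical stand-ins for configured processing steps.

```python
import base64
from typing import Callable, Iterable

# A processing step is modeled as a function from bytes to bytes (an assumption).
Step = Callable[[bytes], bytes]

def run_steps(data: bytes, steps: Iterable[Step]) -> bytes:
    """Apply each step in order; each step receives the previous step's output."""
    for step in steps:
        data = step(data)
    return data

# Upper-case first, then Base64-encode the upper-cased bytes.
result = run_steps(b"hello", [bytes.upper, base64.b64encode])
print(result)  # b'SEVMTE8='
```

Reversing the two steps gives a different result, which is why the order of processing steps matters.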

Adding processing steps to a Data Process step

The Data Process step dialog opens after you add a Data Process step to your process or open an existing one.

1. Under Data Process Properties, click the plus icon to add a processing step.
2. Choose a Processing Step from the list. The processing step types are described in the sections below.
3. Configure the processing step according to the type you selected.
4. Repeat steps 1-3 to add more processing steps.
5. Click the Move Step Up or Move Step Down icons to change the order of the steps as needed.
6. Click OK to save your changes.

Processing step types

You can add the following types of processing steps:

BASE64 Decode

Decode document data from Base64 encoding to plain text. Base64 encoding is often used to transmit binary data within XML or JSON messages.

BASE64 Encode

Encode document data from plain text to Base64 encoding. Base64 encoding is often used to transmit binary data within XML or JSON messages.
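To illustrate what Base64 encoding and decoding do to a payload, here is a small Python sketch using the standard `base64` module. It shows the transformation itself, not how the Data Process step implements it; the sample bytes are made up.

```python
import base64

# Arbitrary binary payload (illustrative only).
raw = b"\x00\x01binary payload\xff"

encoded = base64.b64encode(raw)      # ASCII-only bytes, safe to embed in XML or JSON
decoded = base64.b64decode(encoded)  # restores the original bytes exactly

print(encoded.decode("ascii"))
print(decoded == raw)  # True
```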

Character Decode

Decode document data from the selected character set to the system default (typically UTF-8).

For Character Set, you can use any standard Java platform character set:

US-ASCII
ISO-8859-1
UTF-8
UTF-16BE
UTF-16LE
UTF-16
Other available character sets are runtime-dependent.

Character Encode

Encode document data from the system default (typically UTF-8) to the selected character set. You can use any standard Java platform character set described above in Character Decode.
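The two character steps are mirror images of each other. The Python sketch below shows the idea under the assumption that the system default is UTF-8 and the selected character set is ISO-8859-1; the variable names are illustrative.

```python
# Bytes as they might arrive from a system that writes ISO-8859-1.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")   # b'caf\xe9'

# Character Decode: read the bytes using the selected character set,
# then store the text in the default encoding (UTF-8 assumed here).
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")   # b'caf\xc3\xa9'

# Character Encode: go from the default encoding to the selected character set.
round_trip = utf8_bytes.decode("utf-8").encode("iso-8859-1")

print(round_trip == latin1_bytes)  # True
```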

Combine Documents

Combine multiple documents into a single document. The data from each document is appended to the data from the previous document. This combines multiple documents read in from the Start step's Get connector or created as a result of splitting documents in a Data Process step.

The Combine Documents process type can be used with the following profile types:

Flat files

Combine multiple flat file documents into a single document. When you combine flat file data, make sure the individual documents do not contain column headers that invalidate the format.

Free-Form Header - Adds header text to the combined document.
Free-Form Footer - Adds footer text to the combined document.
Headers Option - Includes or excludes headers that might be in the input documents.
- No Column Headers - If selected, it is assumed that the input documents do not contain column headers. The first line in each document is considered to be a record and is added to the combined document.
- Remove First Line as Column Headers - If selected, the first line in each document is considered to be a header. The headers are not put into the combined document.
- Retain First Line as Column Headers - If selected, the first line in each document is considered to be a header. The header appears prior to the data in the combined document.
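The effect of the three Headers Option values can be sketched in Python as follows. This is an approximation rather than the platform's implementation: the function name and the option strings `"none"`, `"remove"`, and `"retain"` are hypothetical, and the sketch assumes that Retain keeps a single header line (taken from the first document) ahead of the data.

```python
from typing import List

def combine_flat_files(docs: List[str], headers: str = "none",
                       free_form_header: str = "",
                       free_form_footer: str = "") -> str:
    """Combine flat file documents; headers is 'none', 'remove', or 'retain'."""
    out: List[str] = []
    if free_form_header:
        out.append(free_form_header)
    for index, doc in enumerate(docs):
        lines = doc.splitlines()
        if headers in ("remove", "retain") and lines:
            header, lines = lines[0], lines[1:]   # first line is a header, not a record
            if headers == "retain" and index == 0:
                out.append(header)                # assumption: keep one header only
        out.extend(lines)
    if free_form_footer:
        out.append(free_form_footer)
    return "\n".join(out)

docs = ["id,name\n1,Ann", "id,name\n2,Bob"]
print(combine_flat_files(docs, headers="retain"))
```

With `headers="none"`, the repeated `id,name` line would land in the middle of the combined data, which is the kind of invalid format the warning above refers to.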

XML or JSON files

Combine XML or JSON files. When you combine XML or JSON data, the documents must match the selected profile. They are combined based on the selected Combine Element. For XML profiles, you cannot combine by an element that has constraints. (Constraints are used only in legacy XML profiles. New XML profiles use qualifiers and instance identifiers.) You cannot combine by an XML attribute. For JSON profiles, you can use only repeating array elements.

Profile — You must select an XML or JSON profile.
Combine Element — You must select a profile element for combining related records.
Combine Documents Into New Profile — (optional) If selected, you must select the profile into which the documents will be combined.
Combine Into Profile — When combining into a new profile, you must select an XML or JSON profile.
Combine Into Element — When combining into a new profile, you must select a profile element for combining related records.

None Profile Type

Documents are combined without regard to their internal structure.

Custom Scripting

Use custom scripting to meet special processing requirements. You can access and modify the document data and tracked properties using JavaScript or Groovy.

A custom script can be inserted inline as part of the processing step, or you can reference a Process Scripting component. Either way, there is no need to compile because the script is ready to execute when you save the process.

Search/Replace

Use regular expressions to search the document for specific strings or characters defined in the search text box. After the data is located, it is replaced with the string or characters defined in the replace text box.

The Search setting determines the search method. By default, it searches incrementally using a "rolling window" buffer, where Characters defines the maximum matching characters. For example, replacing "this" with "that" in "this is a test string" requires a Characters setting of 4 or more. The default and recommended setting is 1,024 characters, matching the buffer cache size on most systems.

You can also search incrementally line by line, or search the entire document at once.
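The practical difference between searching line by line and searching the entire document shows up when a pattern spans a line break. The following Python sketch uses the standard `re` module to illustrate this; the function names are hypothetical and the rolling-window mode is not modeled.

```python
import re

def replace_entire(text: str, pattern: str, replacement: str) -> str:
    """Search the whole document at once."""
    return re.sub(pattern, replacement, text)

def replace_by_line(text: str, pattern: str, replacement: str) -> str:
    """Search one line at a time; matches cannot cross a line break."""
    return "".join(re.sub(pattern, replacement, line)
                   for line in text.splitlines(keepends=True))

print(replace_entire("this is a test string", "this", "that"))  # that is a test string

# A pattern that spans two lines is only found by the entire-document search.
print(replace_entire("x a\nb y", r"a\nb", "Z"))   # x Z y
print(replace_by_line("x a\nb y", r"a\nb", "Z") == "x a\nb y")  # True (unchanged)
```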

caution

Searching an entire document loads two copies into memory, and Java strings use two bytes per character. Consequently, processing a document requires approximately four times its size in memory; for instance, a 4MB document needs about 16MB of memory.

Boomi-managed runtime clouds allocate 512MB for forked executions. Although technically capable of searching up to 100MB documents, memory fragmentation likely limits the practical maximum to around half that size. Memory fragmentation poses an even greater limitation in local runtimes or clusters.
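The four-times rule of thumb for entire-document searches is plain arithmetic, sketched here in Python. The function name is hypothetical, and the estimate assumes roughly one byte per character on disk.

```python
BYTES_PER_CHAR = 2  # Java strings use two bytes per character (from the caution above)
COPIES = 2          # an entire-document search holds two copies in memory

def estimated_search_memory_mb(document_mb: float) -> float:
    """Rough memory needed to search a document of the given size at once."""
    return document_mb * BYTES_PER_CHAR * COPIES

print(estimated_search_memory_mb(4))  # 16
```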

Split Documents

Split a document into multiple documents by line, or based on a profile element value. Data batches often need splitting for per-record validation and routing. To split data by a database profile, use the batching option in the database operation. The Split Documents process type can be used with the following profile types:

Flat files

You can split flat files by line or by profile. Splitting by line splits each line or record into a separate document, which is useful for breaking up large files into smaller pieces. For example, you might want to split one document with 5,000 records into 10 documents with 500 records each. If you split a large document before a Map step, you can improve performance.

Splitting by profile splits lines or records based on the unique value of a profile element or link element. The files you want to split must match the profile you select and contain the selected link element. Records with the same value for the link element are put in the same document.

The lines you want to link do not need to be consecutive in the source document. The Data Process step searches through the entire document and groups matching lines in the output document(s).
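Grouping by link element can be sketched in Python as shown below. This is only an illustration of the grouping behavior: it assumes a comma-delimited file with a header row, uses a hypothetical function name, and writes the header into every output document (comparable to selecting Keep Headers).

```python
import csv
import io
from typing import Dict, List

def split_by_link_element(document: str, link_column: str) -> List[str]:
    """Group records by the value of one column; one output document per value."""
    reader = csv.DictReader(io.StringIO(document))
    groups: Dict[str, List[dict]] = {}
    for row in reader:
        # Records need not be consecutive: they are collected by value.
        groups.setdefault(row[link_column], []).append(row)

    outputs = []
    for rows in groups.values():
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=reader.fieldnames,
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        outputs.append(buffer.getvalue())
    return outputs

for doc in split_by_link_element("order,item\nA,1\nB,2\nA,3\n", "order"):
    print(doc)
```

Here the two records for order `A` end up in the same output document even though a record for order `B` sits between them.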

note

If you have multiple input documents, add a Combine Documents processing step before the Split Documents processing step.

Batch Count — Specifies the number of lines per document. The default is 0, which means that batching is turned off, and each line splits into a separate document. (This is the same as setting the batch count to 1.)
Headers Option - (Split by Line) Determines how you want any column headers to be handled when splitting the file.
- No Column Headers — The input document does not contain column headers. The first line is considered to be a record and is put into the first document.
- Remove First Line as Column Headers — The first line is considered to be a header, not a record, and is not included in any of the documents.
- Retain First Line as Column Headers — The first line is considered to be a header, not a record, and is placed at the top of each document.
Keep Headers - (Split by Profile) If checked, a header is included at the top of each document.
Profile - (Split by Profile) Select a flat file profile. If no profile is selected, the process will fail.
Link Element - (Split by Profile) Select a profile element for linking related records. If no link element is selected, the process will fail.
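The interaction between Batch Count and the Headers Option when splitting by line can be sketched as follows. This Python function is an illustration under stated assumptions, not the platform's code: the name `split_by_line` and the option strings `"none"`, `"remove"`, and `"retain"` are hypothetical.

```python
from typing import List

def split_by_line(document: str, batch_count: int = 0,
                  headers: str = "none") -> List[str]:
    """Split a flat file into documents of batch_count lines each."""
    lines = document.splitlines()
    header = None
    if headers in ("remove", "retain") and lines:
        header, lines = lines[0], lines[1:]       # first line is a header, not a record

    size = batch_count if batch_count > 0 else 1  # 0 behaves like a batch count of 1
    docs = []
    for start in range(0, len(lines), size):
        chunk = lines[start:start + size]
        if headers == "retain" and header is not None:
            chunk = [header] + chunk              # header repeated at the top of each document
        docs.append("\n".join(chunk))
    return docs

records = "\n".join(str(n) for n in range(5000))
print(len(split_by_line(records, batch_count=500)))  # 10
```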

XML or JSON

You can split XML or JSON files by profile. The files you want to split must match the profile you select and contain the selected split element.

Profile - Select an XML or JSON profile.
Split Element - Select a profile element for splitting related records. You can split by any XML or JSON profile element except the root node.
Batch Count - Specifies the number of split element instances per document. The default is 0, which means that batching is turned off, and each instance splits into a separate document. (This is the same as setting the batch count to 1.)

The structure of the XML or JSON file and the split element determine whether the Batch Count field is honored or ignored.

XML files can be split in two ways:

If there are peer elements along the path to the split element, the "non-batch" split method is used and the Batch Count is ignored.
If there are no peer elements along the path to the split element, and the split element is not an attribute and does not have any XML constraints, the "batch" split method is used and the Batch Count is honored.

JSON files can be split in two ways:

If there are peer elements along the path to the split element, or the split element is an absolute array element or an object element, then the “non-batch” split method is used and the Batch Count is ignored.
If there are no peer elements along the path to the split element and the split element is a repeating array element, then the “batch” split method is used and the Batch Count is honored.
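For the simplest "batch" case in JSON, a repeating array with no peer elements, the split can be sketched in Python as shown below. The function name and the sample key `orders` are hypothetical, and the sketch assumes the array sits directly under the root object.

```python
import json
from typing import List

def split_json_array(document: str, array_key: str,
                     batch_count: int = 0) -> List[str]:
    """Split a root-level repeating array into documents of batch_count items."""
    root = json.loads(document)
    items = root[array_key]
    size = batch_count if batch_count > 0 else 1   # 0 behaves like a batch count of 1
    return [json.dumps({array_key: items[start:start + size]})
            for start in range(0, len(items), size)]

for doc in split_json_array('{"orders": [1, 2, 3, 4, 5]}', "orders", batch_count=2):
    print(doc)
```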

EDI

EDI data can be split into separate documents by a segment or data element, with each link element instance and its data becoming a distinct document. Unlike Flat File splitting, EDI splitting does not group records by the link/split element value. When splitting by EDI, you must select an EDI profile and a segment or data element.

Modifying any elements within the profile resets the value of the link/split element, regardless of which specific element has been modified or linked.

Map JSON to Multipart Form Data MIME

Convert an input JSON document containing simple form data and file attachments into a multipart/form-data MIME document.

To create a MIME document with file attachments from a JSON document, specify a Document Cache and Index to retrieve attachments. The JSON document must include headers for the MIME document and type elements to distinguish between data and attachments. The element name of the attachment acts as the retrieval key from the Document Cache.

When the JSON document's Type element has a "key" value, a Content-Type element specifying the document's file or data type is required. If Content-Type is not text/plain, the "key" Type is treated as a file attachment. If the Type element has a "data" value, the Data Process step assumes text/plain data, and Content-Type is not needed.

note

When the 'Type' is a file reference and no 'Content-Transfer-Encoding' is specified, a default value is automatically added according to MIME RFC standards. Acceptable values for 'Content-Transfer-Encoding' include "7bit", "8bit", "binary", "quoted-printable", "base64", ietf-token, and x-token. For more information, refer to Multipurpose Internet Mail Extensions (MIME) RFC Standards.

JSON input document example

{
  "FirstName" : {
    "value" : "John",
    "Content-Type" : "text/plain; charset=ISO-8859-1",
    "Content-Transfer-Encoding" : "8bit",
    "type" : "data"
    },
  "LastName" : {
    "value" : "Doe",
    "Content-Type" : "text/plain",
    "Content-Transfer-Encoding" : "8bit",
    "type" : "data"
    },
  "Email" : {
    "value" : "johndoe@boomi.com",
    "Content-Type" : "text/plain; charset=ISO-8859-1",
    "Content-Transfer-Encoding" : "8bit",
    "type" : "data"
    },
  "Phone" : {
    "value" : "6105555555",
    "Content-Type" : "text/plain; charset=ISO-8859-1",
    "Content-Transfer-Encoding" : "8bit",
    "type" : "data"
    },
  "resumeAttachment" : {
    "value" : "resume.docx",
    "Content-Type" : "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "Content-Transfer-Encoding" : "binary",
    "type" : "key"
    }
}

JSON to multipart/form-data MIME output document example

Content-Type: multipart/form-data; boundary=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

--xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Disposition: form-data; name="FirstName"
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

John

--xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Disposition: form-data; name="LastName"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

Doe

--xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Disposition: form-data; name="Email"
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

johndoe@boomi.com

--xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Disposition: form-data; name="Phone"
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

6105555555

--xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Disposition: form-data; name="resumeAttachment"; filename="resume.docx"
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Content-Transfer-Encoding: binary

t4U«¨ªÚéãkü@×&™ì¾i ­ õO,îã\s8VŸÞ9›(*Eâø3Ï™eJÅ>q~½¾Lcg

--xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx--
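The mapping from the JSON input to the MIME output can be approximated with the Python sketch below. It is not the Data Process step's implementation. The assumptions: a plain dictionary named `cache` stands in for the Document Cache, every element whose type is "key" is treated as a file attachment, and no default Content-Transfer-Encoding is added.

```python
import json
from typing import Dict

def json_to_multipart(json_text: str, cache: Dict[str, bytes],
                      boundary: str) -> bytes:
    """Build a multipart/form-data document from the JSON layout shown above."""
    fields = json.loads(json_text)
    chunks = [f"Content-Type: multipart/form-data; boundary={boundary}\r\n\r\n"
              .encode("ascii")]
    for name, part in fields.items():
        is_file = part["type"] == "key"
        disposition = f'form-data; name="{name}"'
        if is_file:
            disposition += f'; filename="{part["value"]}"'
        headers = [f"Content-Disposition: {disposition}"]
        for key in ("Content-Type", "Content-Transfer-Encoding"):
            if key in part:
                headers.append(f"{key}: {part[key]}")
        # Attachments are looked up by element name; simple data is inlined.
        body = cache[name] if is_file else part["value"].encode("utf-8")
        chunks.append(f"--{boundary}\r\n".encode("ascii"))
        chunks.append(("\r\n".join(headers) + "\r\n\r\n").encode("ascii"))
        chunks.append(body + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(chunks)
```

A call such as `json_to_multipart(text, {"resumeAttachment": file_bytes}, "xyz")` returns the full MIME document as bytes, with the attachment body taken from the dictionary entry whose key matches the element name.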

Map Multipart Form Data MIME to JSON

Transform a multipart/form-data MIME input into a single JSON output. Simple data and attachment files are mapped into JSON objects, differentiated by a "type" element. Attachment files, using their form name as the key, are stored in the Document Cache.

Multipart/form-data MIME input document example

Content-Type: multipart/form-data; boundary=Y-BFLHmahNxCutH0ijbUpI9_csBSF9fVt672JxIh

--Y-BFLHmahNxCutH0ijbUpI9_csBSF9fVt672JxIh
Content-Disposition: form-data; name="FirstName"
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

John
--Y-BFLHmahNxCutH0ijbUpI9_csBSF9fVt672JxIh
Content-Disposition: form-data; name="LastName"
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

Doe
--Y-BFLHmahNxCutH0ijbUpI9_csBSF9fVt672JxIh
Content-Disposition: form-data; name="Email"
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

johndoe@boomi.com
--Y-BFLHmahNxCutH0ijbUpI9_csBSF9fVt672JxIh
Content-Disposition: form-data; name="Phone"
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

6105555555
--Y-BFLHmahNxCutH0ijbUpI9_csBSF9fVt672JxIh
Content-Disposition: form-data; name="resumeAttachment"; filename="resume.docx"
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Content-Transfer-Encoding: binary

t4U«¨ªÚéãkü@×&™ì¾i õO,îã\s8VŸÞ9›(*Eâø3Ï™eJÅ>q~½¾Lcg

--Y-BFLHmahNxCutH0ijbUpI9_csBSF9fVt672JxIh--

Multipart/form-data MIME to JSON output document example

{
  "FirstName" : {
    "value" : "John",
    "Content-Type" : "text/plain; charset=ISO-8859-1",
    "Content-Transfer-Encoding" : "8bit",
    "type" : "data"
    },
  "LastName" : {
    "value" : "Doe",
    "Content-Type" : "text/plain",
    "Content-Transfer-Encoding" : "8bit",
    "type" : "data"
    },
  "Email" : {
    "value" : "johndoe@boomi.com",
    "Content-Type" : "text/plain; charset=ISO-8859-1",
    "Content-Transfer-Encoding" : "8bit",
    "type" : "data"
    },
  "Phone" : {
    "value" : "6105555555",
    "Content-Type" : "text/plain; charset=ISO-8859-1",
    "Content-Transfer-Encoding" : "8bit",
    "type" : "data"
    },
  "resumeAttachment" : {
    "value" : "resume.docx",
    "Content-Type" : "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "Content-Transfer-Encoding" : "binary",
    "type" : "key"
    }
}
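The reverse mapping can be approximated with Python's standard `email` package, as sketched below. This is an illustration only: the function name is hypothetical, a returned dictionary stands in for the Document Cache, and any part that carries a filename is treated as an attachment.

```python
import json
from email import message_from_bytes
from typing import Dict, Tuple

def multipart_to_json(raw: bytes) -> Tuple[str, Dict[str, bytes]]:
    """Return the JSON text plus a name-to-bytes dictionary of attachments."""
    message = message_from_bytes(raw)
    result: Dict[str, dict] = {}
    cache: Dict[str, bytes] = {}
    for part in message.get_payload():          # one entry per MIME part
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()
        payload = part.get_payload(decode=True)  # body as bytes
        entry = {}
        if filename is not None:
            cache[name] = payload                # attachment stored under its form name
            entry["value"] = filename
            kind = "key"
        else:
            charset = part.get_content_charset() or "utf-8"
            entry["value"] = payload.decode(charset)
            kind = "data"
        for header in ("Content-Type", "Content-Transfer-Encoding"):
            if part[header] is not None:
                entry[header] = part[header]
        entry["type"] = kind
        result[name] = entry
    return json.dumps(result, indent=2), cache
```

The returned JSON text follows the layout of the output example above, and the dictionary holds the raw attachment bytes keyed by form name.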

PGP Encrypt

Encrypt a file based on the desired security setting and certificate.

Security Setting — Encrypt, Sign, or Sign and Encrypt.
Encrypt Cert — Select a Certificate component for encrypting.
Signing Cert — Select a Certificate component for signing.
Clear Text — For a Security Setting of Sign, controls whether PGP messages are processed as readable text.

You can also use a document property to set the file name for the PGP-encrypted file.

PGP Decrypt

Decrypt a file based on the defined certificates.

Decrypt Cert — Select a Certificate component for decrypting.
Signing Cert - Select a Certificate component for verifying the signature. The Enforce Strict Signed Check setting controls whether unsigned documents are accepted.

You can also get the file name and put it in a document property.

XSLT Transformation

Transform and process an input XML document into another output XML document using a user-defined XSLT stylesheet.

The XSLT Transformation step allows you to either reuse an existing XSLT Stylesheet component or create a new one containing your XSLT script. During data processing, this script is executed to transform incoming XML documents.

XSLT stylesheets can use dynamic document properties to set and retrieve variable values, enabling the exchange of runtime data with XSLT transformations for use within the transformation or by subsequent steps. For more information, refer to XSLT Transformation step: working with dynamic document properties.

note

When you copy a Data Process step with the Recommended Security Settings option unchecked, the copied step also keeps the option unchecked because its XML retains the original configuration.

Zip

Zip incoming data into the WinZip compression format. The incoming data is zipped and the documents are passed through the Data Process step into the process. You can use a document property to set the zipped file's file name.

Unzip

Unzip incoming data that is in the WinZip compression format. The incoming data is unzipped and the documents are passed through the Data Process step into the process. You can also get the zipped file's file name and put it in a document property.
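The Zip and Unzip steps can be pictured with Python's standard `zipfile` module. This is a sketch of the round trip, not of the platform's implementation; the function names are hypothetical, and the `file_name` argument plays the role of the document property that carries the file name.

```python
import io
import zipfile
from typing import Dict

def zip_document(data: bytes, file_name: str) -> bytes:
    """Compress one document into a ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(file_name, data)
    return buffer.getvalue()

def unzip_documents(archive_bytes: bytes) -> Dict[str, bytes]:
    """Extract every entry; keys are the file names stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}

zipped = zip_document(b"id,name\n1,Ann\n", "customers.csv")
print(unzip_documents(zipped))  # {'customers.csv': b'id,name\n1,Ann\n'}
```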